NSF PAR Search | NSF Public Access Repository

DRLComplex: Reconstruction of Protein Quaternary Structures Using Deep Reinforcement Learning

Soltanikazemi, Elham; Roy, Raj; Quadir, Farhan; Giri, Nabin; Morehead, Alex; and Cheng, Jianlin (July 2023, The International Conference on Intelligent Biology and Medicine (ICIBM))

Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps

https://doi.org/10.1186/s12859-022-04829-1

Mahmud, Sajid; Guo, Zhiye; Quadir, Farhan; Liu, Jian; Cheng, Jianlin (December 2022, BMC Bioinformatics)

Information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries using 1D sequence features and a predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map includes structural information that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by 1D and 2D U-Nets, respectively, to generate hidden features. The hidden features are then passed to multi-head attention to predict the probability of each residue of a protein being in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can be used to classify proteins as single-domain or multi-domain proteins. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. Tested on the CASP14 multi-domain protein targets with expert-annotated domain boundaries, the average per-target F1 score of the domain boundary prediction by DistDom is 0.263, 29.56% higher than the state-of-the-art method.
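The two evaluation steps named in this abstract (classifying a protein from residue-level boundary predictions, and scoring predicted boundary residues with F1) can be illustrated with a minimal sketch. The function names, the 0.5 probability cutoff, and the "any residue above the cutoff" rule are assumptions made for illustration; the abstract does not state how DistDom performs either step.

```python
def is_multi_domain(boundary_probs, threshold=0.5):
    """Call a protein multi-domain if any residue's predicted boundary
    probability reaches the cutoff (hypothetical rule and cutoff)."""
    return any(p >= threshold for p in boundary_probs)


def boundary_f1(predicted, actual):
    """F1 score between predicted and annotated boundary residue indices."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting residues {10, 11, 12, 40} against annotated residues {11, 12, 13} gives a precision of 2/4 and a recall of 2/3, so the F1 score is 4/7.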
Full Text Available
DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

https://doi.org/10.1038/s41598-021-91827-7

Quadir, Farhan; Roy, Raj S.; Halfmann, Randal; Cheng, Jianlin (December 2021, Scientific Reports)

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that the MSA of a monomer that forms homomultimers contains the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using the MSA and other co-evolutionary features of a single monomer, followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions: 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
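The Top-L/10 precision with a residue-shift tolerance reported above can be sketched as follows. This is an illustration only: the function name, the input layout, and the reading of "within two residue shifts" as a tolerance of up to two positions in each residue index are assumptions, not the authors' evaluation code.

```python
def top_lk_precision(scored_pairs, true_contacts, length, k=10, shift=2):
    """Precision of the top L/k predicted interchain contacts.

    scored_pairs:  iterable of (i, j, score) predictions.
    true_contacts: set of (i, j) residue pairs that are in contact.
    length:        L, the length of the monomer.
    A prediction counts as correct if a true contact lies within `shift`
    residues of it in both indices (assumed interpretation).
    """
    n_top = max(1, length // k)
    ranked = sorted(scored_pairs, key=lambda p: p[2], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = 0
    for i, j, _score in ranked:
        if any((i + di, j + dj) in true_contacts
               for di in range(-shift, shift + 1)
               for dj in range(-shift, shift + 1)):
            hits += 1
    return hits / len(ranked)
```

Setting `shift=0` gives the strict precision, and `k=5` gives the Top-L/5 variant used by other methods in this listing.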
Full Text Available
A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers

https://doi.org/10.1093/bioinformatics/btac063

Roy, Raj S.; Quadir, Farhan; Soltanikazemi, Elham; Cheng, Jianlin; Xu, Jinbo (ed.) (February 2022, Bioinformatics)

Motivation: Deep learning has revolutionized protein tertiary structure prediction recently. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to the lack of advanced deep learning methods in the field. Because interchain residue–residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue–residue contacts in homodimers from residue–residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue–residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. Results: Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset and CASP-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.10% and 33.50%, respectively, at a 6 Å contact threshold, which is substantially better than DeepHomo and DNCON2_Inter and similar to Glinter. Moreover, our experiments demonstrate that using predicted tertiary structure or intrachain contacts of monomers in the unbound state as input, DRCon still performs well, even though its accuracy is lower than when true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers. Availability and implementation: The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon. The datasets are available at https://zenodo.org/record/5998532#.YgF70vXMKsB.
Supplementary information: Supplementary data are available at Bioinformatics online.
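To make the 6 Å contact threshold concrete, the sketch below derives interchain contacts from two chains' coordinates. It is a simplified illustration under stated assumptions: each residue is reduced to one representative 3D point, and the function name and input layout are hypothetical. The abstract does not say which atoms DRCon uses to measure distances.

```python
import math


def interchain_contacts(chain_a, chain_b, threshold=6.0):
    """Return residue index pairs (i, j) whose representative points lie
    closer than `threshold` angstroms.

    chain_a, chain_b: lists of (x, y, z) tuples, one per residue
    (assumption: a single representative point per residue).
    """
    return {
        (i, j)
        for i, a in enumerate(chain_a)
        for j, b in enumerate(chain_b)
        if math.dist(a, b) < threshold
    }
```

The resulting set of pairs is the kind of ground truth against which top L/5 predictions would be scored.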
DeepComplex: A Web Server of Predicting Protein Complex Structures by Deep Learning Inter-chain Contact Prediction and Distance-Based Modelling

https://doi.org/10.3389/fmolb.2021.716973

Quadir, Farhan; Roy, Raj S.; Soltanikazemi, Elham; Cheng, Jianlin (August 2021, Frontiers in Molecular Biosciences)

Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools are available that leverage the latest deep learning technology for inter-chain contact prediction and distance-based modelling to predict protein quaternary structures. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. The DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html
Full Text Available
High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function

https://doi.org/10.1109/MLHPC54614.2021.00010

Gao, Mu; Lund-Andersen, Peik; Morehead, Alex; Mahmud, Sajid; Chen, Chen; Chen, Xiao; Giri, Nabin; Roy, Raj S.; Quadir, Farhan; Effler, T. Chad; et al (November 2021, IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC))

Full Text Available
Prediction of protein assemblies, the next frontier: The CASP14‐CAPRI experiment

https://doi.org/10.1002/prot.26222

Lensink, Marc F.; Brysbaert, Guillaume; Mauri, Théo; Nadzirin, Nurul; Velankar, Sameer; Chaleil, Raphael A.; Clarence, Tereza; Bates, Paul A.; Kong, Ren; Liu, Bin; et al (December 2021, Proteins: Structure, Function, and Bioinformatics)

Full Text Available
Impact of AlphaFold on structure prediction of protein complexes: The CASP15‐CAPRI experiment

https://doi.org/10.1002/prot.26609

Lensink, Marc F.; Brysbaert, Guillaume; Raouraoua, Nessim; Bates, Paul A.; Giulini, Marco; Honorato, Rodrigo V.; van Noort, Charlotte; Teixeira, Joao M. C.; Bonvin, Alexandre M. J. J.; Kong, Ren; et al (October 2023, Proteins: Structure, Function, and Bioinformatics)

We present the results for CAPRI Round 54, the 5th joint CASP‐CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo‐trimers, 13 heterodimers including 3 antibody–antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High‐quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2‐Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2‐Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.